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Abstract.  A  method  for  detecting  intrinsic  slow  variables  in  high-dimensional  stochastic  chemical  reaction  networks  is  developed 
and  analyzed.  It  combines  anisotropic  diffusion  maps  (ADM)  with  approximations  based  on  the  chemical  Langevin  equation  (CLE). 
The  resulting  approach,  called  ADM-CLE,  has  the  potential  of  being  more  efficient  than  the  ADM  method  for  a  large  class  of  chemical 
reaction  systems,  because  it  replaces  the  computationally  most  expensive  step  of  ADM  (running  local  short  bursts  of  simulations) 
by  using  an  approximation  based  on  the  CLE.  The  ADM-CLE  approach  can  be  used  to  estimate  the  stationary  distribution  of  the 
detected  slow  variable,  without  any  a-priori  knowledge  of  it.  If  the  conditional  distribution  of  the  fast  variables  can  be  obtained 
analytically,  then  the  resulting  ADM-CLE  approach  does  not  make  any  use  of  Monte  Carlo  simulations  to  estimate  the  distributions 
of  both  slow  and  fast  variables. 
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1.  Introduction.  The  time  evolution  of  a  complex  chemical  reaction  network  often  occurs  at  different  time 
scales,  and  the  observer  is  interested  in  tracking  the  evolution  of  the  slowly  evolving  quantities  (i.e. ,  of  the  so 
called  slow  variables)  as  opposed  to  recording  each  and  every  single  reaction  that  takes  place  in  the  system. 
Whenever  a  separation  of  scales  exists,  one  has  to  simulate  a  large  number  of  reactions  in  the  system  in  order  to 
capture  the  evolution  of  the  slowly  evolving  variables.  With  this  observation  in  mind,  it  becomes  crucial  to  be 
able  to  detect  and  parametrize  the  underlying  slow  manifold  corresponding  to  the  slow  variables  intrinsic  to  the 
system.  In  the  present  paper,  we  introduce  an  unsupervised  method  of  discovering  the  underlying  hidden  slow 
variables  in  chemical  reaction  networks,  and  of  their  stationary  distributions,  using  the  anisotropic  diffusion  map 
(ADM)  framework  [39]. 

The  ADM  is  a  special  class  of  diffusion  maps  which  have  gained  tremendous  popularity  in  machine  learning 
and  statistical  analysis,  as  a  robust  nonlinear  dimensionality  reduction  technique,  in  recent  years  [36,  2,  10,  6]. 
Diffusion  maps  have  been  successfully  used  as  a  manifold  learning  tool,  where  it  is  assumed  that  the  high 
dimensional  data  lies  on  a  lower  dimensional  manifold,  and  one  tries  to  capture  the  underlying  geometric  structure 
of  the  data,  a  setup  where  the  traditional  linear  dimensionality  reduction  techniques  (such  as  the  Principal 
Component  Analysis)  have  been  shown  to  fail.  In  the  diffusion  maps  setup,  one  constructs  or  is  given  a  sparse 
weighted  connected  graph  (usually  in  the  form  of  a  weighted  k-Nearest-Neighbor  graph,  with  each  node  connected 
only  to  its  k  nearest  or  most  similar  neighbors),  and  uses  it  to  build  the  associated  combinatorial  Laplacian 
L  =  D  —  W,  where  W  denotes  the  matrix  of  weights  and  D  denotes  a  diagonal  matrix  with  Da  equal  to  the  sum 
of  all  weights  of  the  node  i.  Next,  one  considers  the  generalized  eigenvalue  problem  Lx  =  A Dx,  whose  solutions 
are  related  to  the  solutions  of  the  eigenvalue  problem  Lx  =  Xx ,  where  L  =  D~XW  is  a  row-stochastic  matrix 
often  dubbed  as  the  random  walk  normalized  Laplacian.  Whenever  the  pair  (A,  x)  is  an  eigenvalue-eigenvector 
solution  to  Lx  =  Xx,  then  so  is  (1  —  A,  a;)  for  Lx  =  XDx.  The  (non-symmetric)  matrix  L  can  also  be  interpreted 
as  a  transition  probability  matrix  of  a  Markov  chain  with  state  space  given  by  the  nodes  of  the  graph,  and  entries 
Lij  denoting  the  one-step  transition  probability  from  node  i  to  j. 

In  the  diffusion  map  framework,  one  exploits  a  property  of  the  top  nontrivial  eigenvector  of  the  graph 
Laplacian  of  being  piecewise  constant  on  subsets  of  nodes  in  the  domain  that  correspond  to  the  same  state 
associated  to  the  underlying  slow  variable.  We  make  this  statement  precise  in  Section  4,  and  further  use  the 
resulting  classification  in  Section  5  to  propose  an  unsupervised  method  for  computing  the  stationary  distribution 
of  the  hidden  slow  variable,  without  using  any  prior  information  on  its  structure.  Since  the  top  eigenvectors 
of  the  above  Laplacian  define  the  coarsest  modes  of  variation  in  the  data,  and  have  a  natural  interpretation  in 
terms  of  diffusion  and  random  walks,  they  have  been  used  in  a  very  wide  range  of  applications,  including  but 
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not  limited  to  partitioning  [41,  40],  clustering  and  community  detection  [34,  45,  33],  image  segmentation  [38], 
ranking  [47,  17],  and  data  visualization  and  learning  from  data  [7,  36]. 

The  main  application  area  studied  in  this  paper  are  stochastic  models  of  chemical  reaction  networks.  They 
are  written  in  terms  of  stochastic  simulation  algorithms  (SSAs)  [21,  22]  which  have  been  used  to  model  a  number 
of  biological  systems,  including  the  phage  A  lysis-lysogeny  decision  circuit  [1],  circadian  rhythms  [44],  and  the 
cell  cycle  [27].  The  Gillespie  SSA  [21]  is  an  exact  stochastic  method  that  simulates  every  chemical  reaction, 
sampling  from  the  solution  of  the  corresponding  chemical  master  equation  (CME).  To  characterize  the  behavior 
of  a  chemical  system,  one  needs  to  simulate  a  large  number  of  reactions  and  realizations,  which  leads  to  very 
computationally  intensive  algorithms.  For  suitable  classes  of  chemically  reacting  systems,  one  can  sometimes 
use  exact  algorithms  which  are  equivalent  to  the  Gillespie  SSA,  but  are  less  computationally  intensive,  such  as 
the  Gibson-Bruck  SSA  [19]  and  the  Optimized  Direct  Method  [5].  However,  these  methods  also  stochastically 
simulate  the  occurrence  of  every  chemical  reaction,  which  can  be  a  computationally  challenging  task  for  systems 
with  a  very  large  number  of  species.  One  way  to  tackle  this  problem  is  to  use  parallel  stochastic  simulations  [28]. 
In  this  work,  we  discuss  an  alternative  approach  which  does  not  make  use  of  parallel  stochastic  simulations,  but 
at  the  same  time,  the  proposed  approach  can  also  benefit  from  large  processing  power  and  parallel  computing, 
as  many  steps  of  our  proposed  algorithms  are  highly  parallelizable. 

An  alternative  approach  to  treating  the  molecular  populations  as  discrete  random  variables,  is  to  describe 
them  in  terms  of  their  continuously  changing  concentration,  which  can  be  done  via  the  Chemical  Langevin 
equation  (CLE) ,  a  stochastic  differential  equation  that  links  the  discrete  stochastic  simulation  algorithm  with  the 
deterministic  reaction  rate  equations  [20].  Although  such  an  approach  can  be  less  computationally  expensive, 
it  comes  with  the  disadvantage  that,  for  certain  chemical  systems,  it  can  lead  to  negative  populations  [46].  In 
addition,  note  that  none  of  the  above  approaches  takes  explicit  advantage  of  the  separation  of  scales  if  one  exists, 
a  which  wc  will  make  use  of  in  this  paper  as  detailed  in  Sections  4  and  5. 

It  is  often  the  case  that  a  modeller  is  not  interested  in  every  single  reaction  which  takes  place  in  the  system, 
but  only  in  the  slowly  evolving  quantities.  Certain  systems  possess  multiple  time  scales,  meaning  that  one  has  to 
simulate  a  large  number  of  reactions  to  reveal  the  slow  dynamics.  Several  algorithms  for  chemical  networks  with 
fast  and  slow  variables  have  already  been  developed  in  the  literature.  The  authors  of  [25]  proposed  to  simulate 
the  fast  reactions  using  Langevin  dynamics,  and  the  slow  reactions  using  the  Gillespie  algorithm.  This  approach 
requires  both  the  time  scale  separation  and  a  sufficiently  large  system  volume;  however  the  latter  constraint  can  be 
avoided  using  probability  densities  of  the  fast  species  conditioned  on  the  slow  species,  and  estimating  the  effective 
propensity  functions  of  the  slow  species  [3,  4,  13,  37,  43].  An  alternative  approach  to  simulating  the  evolution  of 
the  slow  variables  while  avoiding  doing  so  for  the  fast  variables,  is  to  estimate  the  probability  distribution  of  the 
slow  variables  [18].  The  key  point  in  this  approach  is  to  use  short  bursts  of  appropriately  initialized  stochastic 
simulations  to  estimate  the  drift  and  diffusion  coefficients  for  an  approximating  Fokker-Planck  equation  written 
in  terms  of  the  slow  variables  [15].  The  success  of  this  approach  has  already  been  demonstrated  in  a  range  of 
applications  including  materials  science  [24],  cell  motility  [14],  and  social  behavior  of  insects  [42], 

Ref.  [8]  introduces  the  conditional  stochastic  simulation  algorithm  (CSSA)  that  allows  one  to  sample  effi¬ 
ciently  from  the  distribution  of  the  fast  variables  conditioned  on  the  slow  ones  [8] ,  and  to  estimate  the  coefficients 
of  the  effective  stochastic  differential  equation  (SDE)  on  the  fly  via  a  proposed  constrained  multiscale  algorithm 
(CMA)  algorithm.  The  CMA  can  be  further  modified  by  estimating  the  drift  and  diffusion  coefficients  in  the 
form  given  by  the  CLE  for  the  slow  subsystem,  which  requires  the  estimation  of  effective  propensity  functions  of 
slow  reactions  [9] .  The  main  question  we  plan  to  address  in  our  present  work  builds  on  and  combines  two  already 
existing  ideas  investigated  in  [8]  and  [39],  and  brings  several  computational  and  algorithmic  improvements.  The 
above-mentioned  CSSA  algorithm  explicitly  makes  use  of  the  knowledge  of  the  slow  variables  (often  unavailable 
in  many  real  applications),  a  drawback  we  plan  to  address  as  explained  later  in  Section  4,  where,  driven  by  the 
top  eigenvector  of  an  appropriately  constructed  Laplacian,  we  discover  the  underlying  slow  variable.  In  doing 
so,  we  make  use  of  the  ADM  framework  [39]  which  modifies  the  traditional  diffusion  map  approach  to  take  into 
account  the  time-dependence  of  the  data,  i.e. ,  the  time  stamp  of  each  of  the  data  points  under  consideration.  By 
integrating  local  similarities  at  different  scales,  the  ADM  gives  a  global  description  of  the  data  set. 

The  rest  of  this  paper  is  organized  as  follows.  In  Section  2  we  provide  a  mathematical  framework  for 
multiscale  modeling  of  stochastic  chemical  reactions  networks  and  detail  the  two  chemical  systems  via  which 
we  use  to  illustrate  our  approach.  In  Section  3  we  introduce  the  ADM-CLE  framework  and  we  highlight  its 
differences  from  the  approaches  which  were  previously  introduced  in  the  literature.  In  Section  4  we  propose  a 
robust  mapping  from  the  observable  space  to  the  “dynamically  meaningful”  inaccessible  space,  that  allows  us  to 
recover  the  hidden  slow  variables.  In  Section  5  we  introduce  a  Markov-based  approach  for  approximating  the 
steady  distribution  of  the  slow  variable,  and  compare  our  results  with  another  recently  proposed  approach.  We 
conclude  with  a  summary  and  discussion  of  future  work  in  Section  6. 
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2.  Problem  formulation.  A  multi-scale  modeling  framework  for  stochastic  chemical  reaction  networks 
can  be  formulated  as  follows.  We  consider  a  well-mixed  system  of  i  chemical  species,  denoted  by  Xi,X2, . . . ,  X( 
that  interact  through  m  reaction  channels  R\,  R2, . . . ,  Rm  in  a  reactor  of  volume  V .  We  denote  the  state  of  the 
system  by  X(t)  =  [X\  (t),X2, . . . ,  A^(i)],  where  Xi{t),  i  =  1,2, ...  ,i  represents  the  number  of  molecules  of  type 
X,  in  the  system  at  time  t.  With  a  slight  abuse  of  notation,  we  interchangeably  use  A?;  to  denote  the  type  i  of 
the  molecule.  In  certain  scenarios,  one  may  assume  that  the  reactions  can  be  classified  as  either  fast  or  slow, 
depending  on  the  time  scale  of  occurrence  [3].  As  expected,  the  fast  reactions  occur  many  times  on  a  timescale 
for  which  the  slow  reactions  occur  with  very  small  probability.  As  defined  in  [3],  the  fast  species  denoted  by  F 
are  those  species  whose  population  gets  changed  by  a  fast  reaction.  Slow  species  (denoted  by  S)  are  not  changed 
by  fast  reactions.  Considering  that  slow  species  are  not  only  species  from  the  set  {X1,X2, . . . ,  Xi},  but  also  their 
functions  which  are  not  changed  by  fast  reactions,  the  components  of  the  fast  and  slow  species  can  be  used  as  a 
basis  for  the  state  space  of  the  system,  whose  dimension  equals  the  number  of  linearly  independent  species. 

For  each  reaction  channel  Rj,  j  =  1,2, ...  ,m,  there  exists  a  corresponding  propensity  function  ctj  =  aj(x), 
such  that  cij  dt  denotes  the  probability  that,  given  X(t)  =  x ,  reaction  Rj  occurs  within  the  infinitesimal  time 
interval  [t,t  +  dt).  We  denote  by  v  the  stochiometrix  matrix  of  size  m  x  i,  with  entry  Vji  denoting  the  change 
in  the  number  of  molecules  of  type  A,;  caused  by  one  occurrence  of  reaction  channel  Rj.  The  continuous  time 
discrete  in  space  Markov  chain  can  be  further  approximated  by  the  CLE  for  a  multivariate  continuous  Markov 
process  [20].  Using  time  step  At,  the  Euler-Maruyama  discretization  of  the  CLE  is  given  by 

m  m 

Xi (i  +  At)  =  Xi ( t )  +  At  (x(t))  +  \lai(x(t))  N0  W  '/At,  for  alH  =  1, 2, . . . , i,  (2.1) 

3  = 1  3= 1 

where  Xi,  with  another  slight  abuse  of  notation,  denotes  a  real- valued  approximation  of  the  number  of  molecules 
of  the  i-th  chemical  species,  i  =  1,2, ...  ,t.  Here,  Nj(t),  j  =  1,2, ... ,  m,  denote  the  set  of  m  independent  normally 
distributed  random  variables  with  zero  mean  and  unit  variance. 

2.1.  Illustrative  example  CS-I.  As  the  first  illustrative  example,  we  consider  the  following  simple  2- 
dimensional  chemical  system,  with  the  two  chemical  species  denoted  by  X\  and  X2  (i.e. ,  i  =  2)  which  are  subject 
to  four  reaction  channels  Rj,  j  =  1,2, 3, 4  (i.e.,  m  =  4),  given  by 


0 


Ad 


(2.2) 


Throughout  the  rest  of  this  paper,  we  shall  refer  to  the  chemical  system  (2.2)  as  CS-I  (i.e.  “chemical  system  I”). 
We  label  by  Rj  the  reaction  corresponding  to  the  reaction  rate  subscript  kj,  j  =  1,2,  3, 4,  and  note  that  each 
reaction  Rj  has  associated  a  propensity  function  a.j(t)  given  by  [21] 


ai(t)  =  k\V,  a2{t)  =  fc2Ai(f),  a3(t)  =  k3X2(t),  a.i(t)  =  hi  X2(t) ,  (2.3) 


where  V  denotes  the  volume  of  the  reactor.  We  consider  the  system  with  the  following  dimensionless  parameters 


k\V  =  100,  k2  =  k3  =  200  and  k±  =  1. 


(2.4) 


We  plot  in  Figure  2.1(a)  the  time  evolution  of  the  two  different  species  in  system  (2.2),  together  with  the  slow 
variable  S  =  (Ai  +  X2)/2,  starting  from  initial  conditions  Ai(0)  =  A2(0)  =  100.  As  the  figure  shows,  the  system 
variables  X\  and  A2  are  changing  very  frequently  (thus  we  label  them  as  fast  variables) ,  while  the  newly  defined 
variable  S  changes  very  infrequently  and  can  be  considered  to  be  a  slow  variable. 

Following  [26],  for  the  chemical  system  in  (2.2)  comprised  only  of  monomolecular  reactions,  it  is  possible  to 
compute  analytically  the  stationary  distribution  of  the  slow  variable  S,  since  the  joint  probability  distribution  of 
the  two  variables  Ai  and  A2  is  a  multivariate  Poisson  distribution 


\"1  \"2 

P(Ai  =  m,  A2  =  n2)  =  ^-7^7  exp(— Ai  -  A2) 
ni!  no! 


with  parameters  given  by 


Ai 


k\V{k3  +  kj) 
k2  Aq 


100.5  and  A2  =  ^  =  100. 

k\ 


(2.5) 


(2.6) 
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Fig.  2.1.  (a)  Trajectories  of  the  CS-I  considered  in  (2.2)  showing  the  behavior  of  the  slow  variable  S  =  ( X\  +X2)/2  in  contrast 

to  the  behavior  of  the  fast  variables  X±  and  X2,  where  the  system  propensity  functions  and  parameters  are  given  by  (2.3)  and  (2.4). 
(b)  Trajectories  of  the  CS-II  considered  in  (2.7),  showing  the  slow  behavior  of  the  variable  S  =  X 1  +  2X2  in  contrast  to  the  fast 
behavior  of  variables  X\  and  X2,  where  the  system  parameters  are  given  by  (2.8). 


2.2.  Illustrative  example  CS-II.  The  second  example  is  taken  from  Ref.  [8].  We  shall  refer  to  it  as  CS-II 
from  now  on.  We  consider  the  following  system 

X27=±X4+X2,  0  X\,  X1+X1^j±X2,  (2.7) 

k2  k4  k6 

involving  two  molecular  species  X4  and  A'2,  whose  reactions  Ri,  R2, . . . ,  Rq  have  the  propensity  functions  given 

by 


(*i(t)  =  k1X2(t),  a2(t)  =  k2X1(t)X2(t)/V,  a3(t)  =  k3V, 

a4  (t)  =  k4X1(t),  a5(t)  =  k5  ^  ;  a6  (t )  =  k6X2  (t) , 

where  V  denotes  the  system  volume.  Figure  2.1(b)  shows  a  simulated  trajectory  of  this  chemical  system  using 
the  Gillespie  algorithm  for  the  following  dimensionless  parameters  [8] 

A:-,  =  32,  k2  =  0.041/  k3V  =  1475,  k4  =  19.75,  k5  =  101/  k6  =  4000,  (2.8) 

where  we  use  V  =  8.  Note  that  in  this  second  example,  reactions  R§  and  Rq  are  occurring  on  a  much  faster 
timescale  than  the  other  four  reactions  R\,  R2 ,  R3  and  R4.  A  natural  choice  for  the  slow  variable  is  S  =  X\+2X2, 
which  is  invariant  with  respect  to  all  fast  reactions  [8],  as  we  illustrate  in  Figure  2.1(b). 

2.3.  Main  problem.  Our  end  goal  in  the  present  paper  is  to  propose  an  algorithm  that  efficiently  and 
accurately  estimates  the  stationary  probability  density  of  the  hidden  slow  variable  S ,  without  any  prior  knowledge 
of  it.  The  approach  we  propose  builds  on  the  anisotropic  diffusion  map  framework  (ADM)  to  implicitly  discover 
the  mapping  from  the  observable  state  space  to  the  dynamically  meaningful  coordinates  of  the  fast  and  slow 
variables,  as  previously  introduced  in  [39],  and  on  the  CLE  approximation  (2.1). 

3.  ADM-CLE  approach.  Let  us  consider  example  CS-II,  and  assume  that  s  =  s( x4,x2)  =  x4  +  2x2 
and  /  =  /( x4,x2)  =  x4  are  the  slowly  and  rapidly  changing  variables,  respectively.  They  together  define 
a  mapping  g  :  (xi,x2)  K >  ( s,  / )  from  the  observable  state  variables  x4  and  x2  in  the  accessible  space  O  to 
the  “dynamically  meaningful”  (but  in  more  complicated  examples  inaccessible)  slow  variable  s  and  the  fast 
accessible  variable  /,  both  in  space  77.  In  other  words,  g  maps  (x4,x2)  i— >  (x4  +  2x2,x4),  and  conversely  its 
inverse  h  :=  g~x  :  (s,  /)  (/,  ^). 

The  approach  introduced  in  [39]  exploits  the  local  point  clouds  generated  by  many  local  bursts  of  simulations 
at  each  point  (xi,^)  hr  the  observable  space  O.  Such  observable  local  point  clouds  are  the  image  under  h  of 
similar  local  point  clouds  in  the  inaccessible  space  77  (at  corresponding  coordinates  (s,  /)  such  that  h(s,  /)  = 
(xi,  x2)),  which,  due  to  the  separation  of  scales  between  the  fast  and  slow  variables  /  and  s,  have  the  appearance 
of  thin  elongated  ellipses.  It  is  precisely  this  separation  of  scales  that  we  leverage  into  building  a  sparse  anisotropic 
graph  Laplacian  L  in  the  observable  space,  and  use  it  as  an  approximation  of  the  isotropic  graph  Laplacian  in 
the  inaccessible  space  77.  As  we  shall  see,  the  top  nontrivial  eigenvector  of  L  will  robustly  indicate  all  pairs  of 
original  states  (xi,  x2)  that  correspond  to  the  same  slow  variable  S  =  s  (where  s  =  x4  +  2x2  for  CS-II).  In  other 
words,  we  discover  on  the  fly  the  structure  of  the  slow  variable  5,  and  further  integrate  this  information  into  a 
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Markov-based  method  for  estimating  its  stationary  distribution  ¥(S  =  s),  while  also  computing  along  the  way  an 
analytical  expression  for  the  conditional  distribution  of  the  fast  variable  given  the  slow  variable  P(F  =  f\S  =  s). 

Singer  et  al.  [39]  run  many  local  bursts  of  simulations  for  a  short  time  step  St  starting  at  ( Xi,X2 )■  Such 
trajectories  end  up  at  random  locations  forming  a  cloud  of  points  in  the  observable  plane  0,  with  a  bivariate 
normal  distribution  with  2x2  covariance  matrix  E.  The  shape  of  the  resulting  point  cloud  is  an  ellipse,  whose 
axes  reflect  the  dynamics  of  the  data  points.  In  other  words,  when  there  is  a  separation  of  scales,  the  ellipses  are 
thin  and  elongated,  with  the  ratio  between  the  axis  of  the  ellipse  given  by  the  ratio 


^2 


(3.1) 


of  the  two  eigenvalues  of  E.  The  first  eigenvector  corresponding  to  Ai  points  in  the  direction  of  the  fast  dynamics 
on  the  line  x\  +  2x2  =  s,  while  the  second  one  points  in  the  direction  of  the  slow  dynamics.  In  particular, 
r  is  a  small  parameter,  i.e.  0  <  r  C  1.  In  general,  we  wish  to  piece  together  locally  defined  components 
into  a  globally  consistent  framework,  a  nontrivial  task  when  the  underlying  unobservable  slow  variables  (or  the 
propensity  functions  of  the  system)  are  complicated  nonlinear  functions  of  the  observable  variables  in  0. 

The  construction  of  the  ADM  framework  in  [39]  relates  the  anisotropic  graph  Laplacian  in  the  observable 
space  0  with  the  isotropic  graph  Laplacian  in  the  inaccessible  space  W.  In  that  setup,  each  of  the  N  data 
points  xW,  i  =  1,2,...,  iV,  lives  in  an  ^-dimensional  data  space.  For  both  CS-I  and  CS-II,  the  data  is  two- 
dimensional,  thus  t  =  2.  For  the  former  system,  we  consider  each  lattice  point  in  the  domain  [50, 150]  x  [50, 150], 
hence  there  are  N  =  1012  =  10,201  states,  while  for  the  latter  one  we  consider  the  domain  [1,110]  x  [1,110], 
i.e.  N  =  1102  =  12,100.  Throughout  the  paper,  we  will  often  refer  to  the  N  data  points  xW  =  ( xi,x2)^, 
i  =  1, 2, . . . ,  N,  as  0-states  of  the  chemical  system.  The  ADM  [39]  then  generates  ensembles  of  short  simulation 
bursts  at  each  of  the  N  points  in  the  data  set,  computes  the  averaged  position  after  statistically  averaging  over 
the  many  simulated  trajectories,  and  obtain  an  estimate  of  the  local  2x2  covariance  matrix  E(q.  For  each  data 
point  xW,  the  inverse  of  E^)  is  computed  and  symmetric  E-dependent  squared  distance  between  pairs  of  data 
points  in  the  two-dimensional  observable  space  R2  (given  by  (3.3)  below)  is  defined.  The  ADM  framework  then 
uses  this  dynamic  distance  measure  to  approximate  the  Laplacian  on  the  underlying  hidden  slow  manifold.  We 
provide  further  details  on  the  anisotropic  diffusion  maps  framework  in  Section  3.2.  We  now  highlight  the  first 
difference  between  the  approach  taken  in  this  paper  and  in  [39]. 

3.1.  Replacing  short  simulation  bursts  by  the  CLE  approximation.  The  local  bursts  of  simulations 
initiated  at  each  data  point  in  order  to  estimate  the  local  covariances  may  be  computationally  expensive  to 
estimate.  In  this  paper,  we  bypass  these  short  bursts  of  simulations  by  using  an  approximation  given  by  the  CLE 
(2.1),  which  allows  for  a  theoretical  derivation  of  the  local  2x2  covariance  matrices.  Using  (2.1),  we  obtain 


Co v(Xi(t  +  At),  Xk(t  +  At))  =  E [Xi{t  +  A t)Xk(t  +  At)]  -  E [X»(i  +  At)]E[Xk(t  +  At)] 


=  A  t^VjiVjkajiX). 
3= 1 


(3.2) 


Computing  the  eigen-decomposition  of  a  local  covariance  matrix  is  analogous  to  performing  the  Principal  Com¬ 
ponent  Analysis  on  the  local  cloud  of  points,  generated  by  the  short  simulations  bursts.  The  advantage  of  (3.2) 
over  the  computational  approach  used  in  [39]  is  that  Em  can  be  computed  at  each  data  point  without  running 
(computationally  intensive)  short  bursts  of  simulations.  The  error  of  the  CLE  approximation  depends  on  the 
values  of  coordinates  of  the  data  point  xW,  i.e.  on  the  system  volume  V  [23,  12].  In  the  case  of  CS-I  or  CS-II,  the 
most  probable  states  contain  about  one  hundred  molecules  of  each  chemical  species  and  the  CLE  approximation 
(3.2)  is  well  justified. 

3.2.  Anisotropic  diffusion  kernels.  The  next  task  is  the  integration  of  all  local  principal  components 
into  a  global  framework,  with  the  purpose  of  identifying  the  hidden  slow  variable.  We  estimate  the  distance 
(and  hence  the  similarity  measure)  between  the  slow  variables  in  the  underlying  inaccessible  manifold  using 
the  anisotropic  graph  Laplacian  [39].  We  derive  a  symmetrized  second  order  approximation  of  the  (unknown) 
distances  in  the  inaccessible  space  "H,  based  on  the  Jacobian  of  the  unknown  mapping  from  the  inaccessible  to 
the  observable  space.  The  E-dependent  distance  between  two  0-states  is  given  by 


d%  ([x1,x2)M,  (zi,ie2)(j))  =  ^  ((x1,x2){z)  -  (aq,  rr2)(j)) 


i-l 

l(x1,x2)(i'> 


—  1 


Xl,x2)^  -  (xi,X2 
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(3.3) 


and  represents  a  second  order  approximation  of  the  Euclidean  distance  in  the  inaccessible  (s,  r/)- space 


d|[(aq,a;2)w,  (aq.aq)^]  «  (sw  -  sW)2  +  r2(/w  -  /(j))2 


(SW  -s(i))2, 


(3.4) 


where  the  last  approximation  is  due  to  the  fact  that  r  is  a  small  parameter,  see  (3.1).  Note  that  it  is  also 
possible  to  extend  (3.4)  to  higher  dimensions,  as  long  as  there  exists  a  separation  of  scales  between  the  set  of  slow 
variables  and  the  set  of  fast  variables  [39].  Using  approximation  (3.3)-(3.4)  of  the  distance  between  states  of  the 
slow  variable,  we  next  construct  (an  approximation  of)  the  Laplacian  on  the  underlying  hidden  slow  manifold, 
using  the  Gaussian  kernel  as  a  similarity  measure  between  the  slow  variable  states.  We  build  an  N  x  N  similarity 
matrix  W  with  entries 


Wij  =  exp 


-^|[(zi,Z2)(i),  {x1,x2)(j)] 

c-2 


exp 


-(«(<)  -sW)2} 

£2  y 


=  (3.5) 


where  the  single  smoothing  parameter  e  (the  kernel  scale)  has  a  two-fold  interpretation.  On  one  hand,  e  denotes 
the  squared  radius  of  the  neighborhood  used  to  infer  local  geometric  information,  in  particular  Wij  is  0(1)  when 
s W  and  sU)  are  in  a  ball  of  radius  y/e,  thus  close  on  the  underlying  slow  manifold,  but  it  is  exponentially  small 
for  states  that  are  more  than  yfe  apart.  On  the  other  hand,  e  represents  the  discrete  time  step  at  which  the 
underlying  random  walker  jumps  from  one  point  to  another.  We  refer  the  reader  to  [31]  for  a  detailed  survey 
of  random  walks  on  graphs,  and  their  applications.  We  normalize  W  using  the  diagonal  matrix  D  to  define  the 
row-stochastic  matrix  L  by 


N 

Dii  =  J2Wv>  L  =  D~XW.  (3.6) 

l=i 

Since  L  is  a  row-stochastic  matrix,  it  has  eigenvalue  Ao  =  1  with  trivial  eigenvector  4>o  =  (1,1,...,1)T.  The 
remaining  eigenvalues  can  be  ordered  as 

1  =  Ao  >  Ai  >  A2  >  . . .  >  Ajv-i- 

We  denote  by  4>j  the  corresponding  eigenvectors,  i.e.  L4>;  =  A j<&j.  The  top  d  nontrivial  eigenvectors  of  the 
random- walk  anisotropic  Laplacian  L  describe  the  geometry  of  the  underlying  d-dimensional  manifold  [16],  i.e. 
the  i-th  data  point  is  represented  by  the  following  diffusion  map 

(Si  (*),$2(*),  ■■•.$«*(*)),  i  =  l,2,...,7V,  (3.7) 

where  4 ?j(i)  denotes  the  «-th  component  of  the  eigenvector  4>j.  However,  note  that  some  of  the  considered 
eigenvectors  can  be  higher  harmonics  of  the  same  principal  direction  along  the  manifold,  thus  in  practice  one 
computes  the  correlation  between  the  computed  eigenvectors  before  selecting  the  above  d  eigenvectors  chosen  to 
parametrize  the  underlying  manifold.  For  the  two  chemical  systems  considered  in  this  paper,  we  show  in  the 
remainder  of  this  section  how  the  top  (i.e.,  d  =  1)  non- trivial  eigenvector  of  L  can  be  used  to  successfully  recover 
the  underlying  slow  variable. 

Using  the  stochasticity  of  L,  we  can  interpret  it  as  a  random  walk  matrix  on  the  weighted  graph  G  =  (U,  E), 
where  the  set  of  nodes  corresponds  to  the  original  observable  states  (aq,  x2)^\  i  =  1,  2, . . . ,  N  (and  implicitly  to 
states  of  the  slow  variable),  and  there  is  an  edge  between  nodes  i  and  j  if  and  only  if  IT,,  >  0.  The  associated 
combinatorial  Laplacian  is  given  by  L  =  D  —  W .  Whenever  the  pair  (A,,  4>j)  is  an  eigenvalue-eigenvector  solution 
to  L&i  =  A t&i,  then  so  is  (1  —  A ,,  4?,;)  for  the  generalized  eigenvalue  problem  =  A i-D4q.  We  plot  in  Figures 
3.1(a)  and  3.2(a)  the  spectrum  of  the  combinatorial  Laplacian  L  =  D  —  W .  for  the  chemical  systems  CS-I  and 
CS-II.  In  Figures  3.1(b)  and  3.2(b)  we  color  the  states  of  the  network  with  the  top  non-trivial  eigenvector  4>i. 

Before  considering  the  top  eigenvector  of  L  for  determining  the  underlying  slow  variable  and  estimating  its 
stationary  distribution,  we  propose  to  use  a  sparse  graph  Laplacian  which  differs  from  the  ADM  method  in  [39] , 
where  the  Laplacian  matrix  is  associated  to  a  complete  weighted  graph.  However,  using  a  complete  graph  leads 
to  computing  the  E-dependent  squared  distance  in  equation  (3.3)  for  any  pair  of  nodes,  thus  an  0(N2)  number 
of  computations  is  used.  In  light  of  the  approximation  (3.4),  a  pair  of  points  which  are  far  away  in  the  observable 
space  (i.e.,  for  which  d^((xi,  x2)W,  (aq,  a;2)^4)  is  large)  denotes  a  pair  of  corresponding  states  of  the  slow  variable 
which  are  also  far  away  in  the  inaccessible  space.  Thus  we  do  not  have  to  do  such  computations,  because  points 
far  away  in  the  unobservable  space  will  have  an  exponentially  small  similarity  Wij  close  to  0.  The  fact  that  the 
shape  of  the  local  point  cloud  is  an  ellipse  provides  some  insight  in  this  direction.  Thus  we  will  build  a  sparse 
graph  of  pairwise  measurements  and  no  longer  compute  the  E-dependent  distance  between  all  points  of  the  data 
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(b) 
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Weighted  degree 


(d) 


Fig.  3.1.  Illustrative  Example  CS-I.  (a)  The  top  500  eigenvalues  of  the  associated  combinatorial  Laplacian,  i.e.  (1  —  A i)  for 
i  =  1,2,...,  500.  (b)  The  coloring  of  the  nodes  of  G  (states  of  the  observable  space)  according  to  their  corresponding  entry  in  the 
top  eigenvector  of  L  given  by  (3.6).  (c)  The  weighted  degree  distribution  of  the  ground  state  graph  G.  (d)  A  scatterplot  of  the 
states  of  the  system,  colored  by  their  weighted  degree. 


(a) 


(b)  (c)  (d) 
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1  weighted  degree  i 


Fig.  3.2.  Illustrative  Example  CS-II.  (a)  The  top  500  eigenvalues  of  the  associated  combinatorial  Laplacian,  i.e.  (1  —  A^)  for 
i  =  1,2,...,  500.  (b)  The  coloring  of  the  nodes  of  G  (states  of  the  observable  space)  according  to  their  corresponding  entry  in  the 
top  eigenvector  of  L  given  by  (3.6).  (c)  The  weighted  degree  distribution  of  the  ground  state  graph  G.  (d)  A  scatterplot  of  the 
states  of  the  system,  colored  by  their  weighted  degree. 


set,  but  only  between  a  very  small  subset  of  the  points.  The  spectrum  of  the  covariance  matrix  E in  particular 
the  ratio  r  of  its  two  eigenvalues  given  by  (3.1),  guides  us  in  building  locally  at  each  point,  a  sparse  ellipsoid-like 
neighborhood  graph. 

For  each  observable  state  {xx,x2)<'l\  we  build  a  local  adjacency  graph,  denoted  by  Gi:  in  the  shape  of  an 
ellipse  pointing  in  the  direction  of  the  fast  dynamics,  and  whose  small  axis  points  in  the  direction  of  the  slow 
dynamics.  Figure  3.3  shows  an  example  of  such  a  local  1-hop  neighborhood  graph  Gi ,  where  the  central  node 
(xi,x2)('l'>  is  connected  to  all  points  contained  within  the  boundaries  of  an  appropriately  scaled  ellipse  centered 
at  (iEi,£2)b).  Finally,  we  define  the  sparse  graph  G  of  size  N  x  N  associated  to  the  entire  network  as  the  union  of 
all  locally  defined  ellipsoid- like  neighborhood  graphs  G  =  (Ji=1  Note  that  the  union  graph  G  is  still  a  simple 
graph,  with  no  self  edges  and  no  multiple  edges  connecting  the  same  pair  of  nodes.  We  compute  the  distance 
ds  by  (3.3)  (and  thus  the  similarity  Wj,-)  between  a  pair  of  nodes  (xi,x2)^  and  (xi,x2)^  if  and  only  if  the 
corresponding  edge  (i.  j)  exists  in  G. 

We  plot  in  Figures  3.1(c)  and  3.2(c)  the  histogram  of  the  weighted  degrees  of  the  nodes  in  the  weighted 
graph  W  defined  in  (3.5),  while  Figures  3.1(d)  and  3.2(d)  show  a  scatterplot  of  the  states  of  the  system,  where 
each  state  i  is  colored  by  its  weighted  degree,  i.e.,  the  sum  of  all  its  outgoing  weighted  edges  Wij,j  =  1,2 , . . . ,  n. 
Throughout  the  computational  examples  in  this  paper,  the  smoothing  parameter  e  which  appears  in  (3.5)  was 
set  to  e  =  0.1.  In  contrast  to  the  approach  in  [39]  which  computes  all  0{N 2)  pairwise  similarities,  the  sparsity 
of  G  (and  thus  of  the  associated  graph  Laplacian  L )  in  our  approach  only  requires  the  computation  of  a  much 
smaller  number  of  distances,  as  low  as  linear,  depending  on  the  discretization  of  the  domain,  and  makes  it 
computationally  feasible  to  solve  problems  with  thousands  or  even  tens  of  thousands  of  nodes. 

4.  A  robust  mapping  from  the  observable  space  O  to  the  “dynamically  meaningful”  inaccessible 
space  H.  As  a  first  step  towards  partitioning  the  nodes  of  the  original  graph  G  and  detecting  the  associated 
slow  variable,  we  sort  the  entries  of  the  top  eigenvector  which  we  then  denote  by  with  $1(1)  >  $i(2)  > 

. ..  >  >I>i(A).  This  sorting  process  defines  permutation  a  of  the  original  index  set  i  =  1,2,  ...,7V  so  that 
$i(cr(i))  =  <f>i(z).  We  consider  the  increments  between  two  consecutive  (sorted)  values 

Si  =  $1  (*)  -  $i(z  +  1), 
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i  =  1,2,...,  N-  1. 


(4.1) 


Fig.  3.3.  The  local  neighborhood  graph  Gi  at  a  given  node  the  shape  is  an  ellipsoid  whose  axis  ratio  is  given  by  the 

ratio  of  the  eigenvalues  r  of  the  local  covariance  matrix  d?^((x\,  X2)^\  (x±t  #2)^),  i.e.  by  (3.1).  The  corresponding  eigenvectors 
are  used  to  calculate  the  orientation  of  the  ellipse. 


Fig.  4.1.  Illustrative  example  CS-I.  (a)  Jump  sizes  (4.1)  of  the  sorted  eigenvector  3>i  of  the  sparse  anisotropic  graph  Laplacian 
L.  (b)  The  correlation  of  &  1  with  the  ground  truth  slow  variable  s  =  (x±  -\-X2)/‘2.  (c)  Zoom-in  on  the  sorted  top  eigenvector  4?i 
(the  colors  denote  the  corresponding  slow  variable)  showing  that  is  almost  piece-wise  constant  on  the  bins  that  correspond  to 
distinct  slow  variable  states.  The  kernel  scale  is  set  to  e  =  0.1. 


Next,  we  sort  the  vector  of  such  increments,  denote  its  entries  by  8-\  >  <52  >  and  show  in  Figure 

4.1(a)  (resp.  Figure  4.2(a))  the  top  300  (resp.  top  420)  largest  such  increments  Si  for  illustrative  example 
CS-I  (resp.  CS-II).  Note  that  this  already  give  us  an  idea  about  the  number  of  distinct  slow  states  in  the 
system,  a  set  which  we  denote  by  S.  Ideally,  the  difference  <I>i(*)  —  $i(j)  in  the  entries  of  the  top  eigenvector 
corresponding  to  two  observable  states  (xi,a:2)W  and  {x\ ,  x^)1'3'1  that  belong  to  the  same  slow  variable  s  (i.e., 
xf  +  2x1  f  =  xf  +  2x};j  =  s  for  illustrative  example  CS-II)  should  be  zero  or  close  to  zero,  in  which  case 
we  expect  that  only  approximately  |<S|  of  the  N  —  1  increments  <5,;  are  significantly  larger  than  zero,  while  the 
remaining  majority  are  zero  or  close  to  zero. 

In  Figures  4.1(b)  and  4.2(b)  we  highlight  the  correlation  between  the  entries  of  the  top  non-trivial  eigenvector 
$1  and  the  corresponding  slow  variable  S.  In  Figures  4.1(c)  and  4.2(c),  we  zoom  on  a  subset  of  states  to  make  the 
point  that  the  eigenvector  <I>i  is  almost  constant  on  the  0-states  that  correspond  to  the  same  value  of  the  slow 
variable.  The  plots  in  Figures  3.1(b)  and  3.2(b)  show  a  coloring  of  the  networks  generated  by  the  two  chemical 
systems  CS-I  and  CS-II,  based  on  the  first  nontrivial  eigenvector  of  the  associated  sparse  Laplacian  L.  Note  that 
the  eigenvector  looks  almost  piecewise  constant  along  the  lines  that  point  to  the  evolution  of  the  fast  variable, 
for  a  given  value  of  the  slow  variable  ( S  =  (Xi  +  X2)/2  for  CS-I  and  S  =  X\  +  2X2  for  CS-II),  yet  nowhere 
along  the  way  we  have  input  this  information  into  the  method.  In  the  next  step  we  use  this  top  eigenvector 
to  identify  all  nodes  of  the  graph  (original  states  of  the  chemical  system)  that  correspond  to  the  same  value  of 
the  underlying  slow  variable.  In  other  words,  all  nodes  whose  corresponding  eigenvector  entries  are  between  an 
appropriately  chosen  interval  (that  we  shall  refer  to  a  bin)  will  be  labeled  as  belonging  to  the  same  slow  variable 
S.  In  other  words,  we  seek  a  partition  of  the  observable  states  in  0,  i.e.,  of  the  nodes  of  G,  such  that  all  original 
states  ( X\,X2 )^  with  the  same  value  of  the  corresponding  slow  variable  s((xi,  x2)W)  end  up  in  the  same  bin. 
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Fig.  4.2.  Illustrative  example  CS-II.  (a)  Jump  sizes  (4.1)  of  the  sorted  eigenvector  3>i  of  the  sparse  anisotropic  graph  Laplacian 
L.  (b)  The  correlation  of  with  the  ground  truth  slow  variable  s  =  x\  +  2x2  •  (c)  Zoom-in  on  the  sorted  top  eigenvector  4>i  (the 
colors  denote  the  corresponding  slow  variable)  showing  that  «i»i  is  almost  piece-wise  constant  on  the  bins  that  correspond  to  distinct 
slow  variable  states.  The  kernel  scale  is  set  to  e  =  0.1. 


Our  goal  is  to  find  a  partition  V  =  {V\,  V2,  ■  ■■  ,Vt t}  of  O  such  that 


k 

Vj  =  {(xi,x2)w  £  O  \  s(x!,x2)  =  qj}  and  (J  Vj=Q,  (4.2) 

3= 1 

where  k  denotes  the  number  of  distinct  values  qj ,  j  =  1,2, ...  ,k,  of  the  slow  variable  S.  As  an  example,  in  the 
case  of  CS-I  given  by  (2.2),  the  partition  Vj  =  {(1, 99),  (2,  98), . . . ,  (99, 1)}  corresponds  to  all  nodes  in  the  graph 
for  which  the  value  of  the  associated  slow  variable  is  constant  qj  =  50.  The  key  observation  we  exploit  here  is 
that  the  top  eigenvector  of  the  Laplacian  matrix  is  almost  piecewise  constant  on  the  bins  that  partition  O,  since 
the  nodes  of  G  that  correspond  to  the  same  value  of  the  slow  variable  have  a  very  high  pairwise  similarity,  with 
Wij  very  close  to  1. 

One  may  also  interpret  the  above  problem  as  a  clustering  problem,  where  the  similarity  between  pairs 
of  points  is  given  by  (3.5),  and  is  such  that  nodes  that  belong  to  the  same  bin  have  a  much  higher  similarity 
compared  to  nodes  that  belong  to  two  different  bins,  an  effect  due  to  the  strong  separation  of  scales.  In  the  case  of 
illustrative  example  CS-I,  the  clusters  correspond  to  lines  in  the  two-dimensional  plane  such  that  ( aq  +  x2)/2  =  c, 
for  a  constant  c.  We  point  out  the  interested  reader  to  the  work  of  [32],  where  the  top  eigenvectors  of  the 
random  walk  Laplacian  are  used  for  clustering.  While  in  practice  one  uses  the  top  several  eigenvectors  as  the 
reduced  eigenspace  where  clustering  is  subsequently  performed,  in  our  case  the  top  eigenvector  alone  suffices  to 
capture  the  many  different  clusters  (i.e. ,  bins),  a  fact  we  attribute  to  the  strong  separation  of  scales  exhibited 
by  the  illustrative  chemical  systems  CS-I  and  CS-II.  If  several  eigenvectors  were  considered  then  one  could  use  a 
clustering  algorithm,  such  as  k-means  or  spectral  clustering  [38,  45],  to  obtain  the  partitioning  (4.2).  However, 
a  simpler  method  has  been  successfully  used  for  the  1-dimensional  eigenspace  in  both  examples  we  considered. 
It  is  described  as  follows.  Recall  the  sorted  vector  of  increments  £1  >  S2  >  . . .  >  6n-i  defined  in  equation  (4.1), 
and  consider  the  set  of  the  k  —  1  largest  such  increments  {<5i ,  S2, . . . ,  where  <5i  >  S2  >  . . .  >  8k- 1-  Next, 

from  the  sorted  eigenvector  $1  we  extract  the  position  of  the  entries  whose  associated  increment  (with  respect 
to  its  right-next  neighbor  index)  belongs  to  {<5i ,  52, . . . ,  5fc_i}.  In  others  words,  we  compute 

bt  =  arg  $i(z)  —  l>i(i  +  1)  =  8t,  where  t  =  1, 2, . . . ,  k  —  1,  (4.3) 

i=l,2,...,AT-l 

and  60  =  0  and  bk  =  N.  Finally,  we  compute  an  estimated  partition  V  of  O  by  Vq  =  {i  £  O  \  cr(i)  £  (bq-i,bq]}, 
where  q  =  1,2 , ...  ,k,  and  a  is  the  permutation  of  the  original  index  set  i  =  1,2,...,  iV,  given  in  the  definition  of 
4>i,  i.e.  $i(ct(*))  =  4>i(i). 

To  illustrate  the  correctness  of  our  proposed  technique,  we  compute  the  Jaccard  index  between  each  proposed 
partition  set  Vj,  j  =  1,2, ...  ,k  and  each  ground  truth  partition  set  Vi,  i  =  1,  2, . . . ,  |S|: 

Jij  =  0  ^  | )  where  i  =  1, 2, . . . ,  |<S|,  j  =  l,2,...,k,  (4.4) 

ImU  Pj  | 

and  show  a  heatmap  of  this  matrix  in  Figure  4.3(b).  Since  we  are  interested  not  only  in  the  partition,  but  also  in 
recovering  the  ordering  of  the  slow  variable,  we  show  in  Figure  4.3(c)  the  correlation  between  the  ground  truth 
ordering  of  the  slow  variable  and  our  recovered  ordering.  Note  that  we  can  only  recover  the  ordering  up  to  a 
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Fig.  4.3.  Illustrative  example  CS-I.  (a)  The  eigenvector-based  slow  variable  cardinality.  The  Theta  score  ©  is  the  smoothness 
measure  of  the  bin  cardinalities,  defined  in  (4.5).  The  algorithm  perfectly  recovers  the  ground  truth  partition,  (b)  The  heatmap  of 
the  pairwise  Jaccard  similarity  matrix  given  by  (4.4).  (c)  The  correlation  between  the  ordering  of  the  ground  truth  slow  variable  and 
the  eigenvector  recovered  slow  variable,  (d)  The  Jaccard  index  of  the  pairwise  matched  bins  (from  the  maximum  matching). 


(a) 


(b) 


Cardinality  true  slow  var.  |<S|  =  328,  ©  =  108 


Cardinality  eig  slow  var.  |<S|  =  328,  ©  =  7206 


(c) 


Cardinality  eig  slow  var.  151  =  328,  ©  =  7206 


Slow  variable  (zoom-in) 


Fig.  4.4.  Illustrative  example  CS-II.  (a)  The  ground  truth  slow  variable  cardinality,  (b)  The  eigenvector-based  slow  variable 
cardinality.  0  captures  the  smoothness  of  the  bin  cardinalities,  as  introduced  in  (4.5).  (c)  Plot  of  the  cardinalities  of  a  subset  of 
bins,  showing  the  erroneous  bin  assignments  in  the  eigenvector-based  partition.  This  is  a  zoomed-in  version  of  panel  (b). 


global  sign,  since  —  ’I'i  is  also  an  eigenvector  of  L.  Finally,  we  compute  the  maximum  weight  matching  (using,  for 
example,  the  Hungarian  method  [29])  in  the  bipartite  graph  with  node  set  V  U  V  and  edges  across  the  two  sets 
given  by  matrix  J  in  (4.4).  In  Figure  4.3(d)  we  plot  the  Jaccard  index  of  the  matched  partitions.  For  the  first 
chemical  system  CS-I,  note  that  the  algorithm  perfectly  recovers  the  ground  truth  partition.  In  Figure  4.4  we 
present  the  outcome  of  the  binning  algorithm  for  the  illustrative  example  CS-II,  which  is  no  longer  satisfactorily 
by  itself  and  requires  further  improvement.  Though  the  bin  cardinalities  in  the  initial  solution  visually  resemble 
the  ground  truth,  there  are  numerous  mistakes  being  made.  To  illustrate  this,  for  a  given  partition  V ,  we  compute 
the  following  measure  of  continuity  of  the  recovered  bin  cardinalities 

S- 1 

Qv  =  ©(Pr,  ...,VS)  =  -  \Vi+1\)2.  (4.5) 

»= i 

In  other  words,  Q-p  captures  the  squared  difference  in  the  cardinalities  of  two  consecutive  bins.  For  the  chemical 
system  CS-II,  the  ground  truth  yields  a  score  0  =  108,  while  for  the  eigenvector-recovered  solution  0  =  7206, 
thus  indicating  already  that  numerous  misclassifications  are  being  made,  without  even  computing  the  Jaccard 
similarity  matrix  (4.4)  between  the  two  partitions.  To  this  end,  we  introduce  in  the  next  subsection  a  heuristic 
denoising  technique  followed  by  a  truncation  of  the  domain,  which  altogether  lead  to  a  better  partitioning  of  O 
into  groups  of  states  that  correspond  to  the  same  slow  variable. 

4.1.  A  bin  denoising  scheme.  While  the  eigenvector-based  partition  procedure  detailed  above  yields 
accurate  results  for  the  CS-I  in  (2.2),  this  procedure  alone  is  not  sufficient  for  obtaining  a  satisfactory  partition 
for  the  more  complex  CS-II  considered  in  (2.7),  as  illustrated  by  the  high  corresponding  0-score  (0  =  7206) 
shown  in  Figure  4.4(b).  In  Figure  4.4(c)  we  zoom  into  some  of  the  recovered  bins,  showing  that  the  eigenvector- 
based  reconstruction  splits  some  of  the  inner  bins,  which  explains  the  high  associated  0-score.  In  other  words, 
states/bins  which  in  the  ground  truth  solution  correspond  to  the  same  values  of  the  slow  state  variable,  are 
divided  into  two  adjacent  bins,  and  mistaken  for  two  distinct  states  of  the  slow  variable. 

To  solve  this  issue,  we  propose  a  bin-denoising  heuristic  that  robustly  assigns  data  points  to  their  respective 
bins.  In  hindsight,  the  continuity  of  the  eigenfunctions  of  the  Laplacian  should  be  reflected  in  the  continuity 
of  the  histogram  of  state  counts  in  bins  corresponding  to  adjacent  intervals.  We  detail  in  Algorithm  1  an 
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Algorithm  1  Bin-merging  algorithm: 


1 

Initialize  FLAG  =  TRUE 

2 

while  FLAG  is  TRUE  do 

3 

Compute  a,  :=  0  ("Pi,  ...,P,U  Vi+1, . . .  V\S\)  , 

Vi  =  1, 2, . . .  |<S  —  1  using  definition  (4.5) 

4 

if  min  a*  <  0(Pi, . . . ,  V\s\)  then 

z=l,2,...,|«S|  — 1 

5 

q  =  argmin  a* 

t=l,2,...,|5|-l 

6 

V  :=V1,...,VqU  Vq+i, .  ■  .V\s\ 

7 

\S\  =  \S\-l 

8 

else 

9 

FLAG  =  FALSE 

10 

end  if 

11 

end  while 

(d) 

Max  matching  in  the  Jaccard  index  matrix 
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Fig.  4.5.  Illustrative  example  CS-II.  (a)  The  eigenvector-based  slow  variable  cardinality  after  truncating  and  bin  denoising. 
The  Theta  score  ©  is  the  smoothness  measure  of  the  bin  cardinalities,  defined  in  (4.5).  (b)  The  heatmap  of  the  pairwise  Jaccard 
similarity  matrix  given  by  (4.4).  (c)  The  correlation  between  the  ordering  of  the  ground  truth  slow  variable  and  the  eigenvector 
recovered  slow  variable,  (d)  The  Jaccard  index  of  the  pairwise  matched  bins  (from  the  maximum  matching). 


iterative  heuristic  procedure  which,  at  each  step,  merges  two  adjacent  bins  such  that  the  resulting  0-score  is 
minimized  across  all  possible  pairs  of  adjacent  bins  that  can  be  merged.  We  show  in  Figure  4.5(a)  the  resulting 
bin  cardinalities  after  the  bin-merging  heuristic  and  after  truncating  at  the  boundary  of  the  slow  variable.  Note 
that  the  new  denoised  partition  yields  0  =  103,  and  the  number  of  bins  (states  of  the  slow  variable)  decreases 
from  |«S|  =  328  to  |<S|  =  314.  Furthermore,  in  Figure  4.5(b)  we  compute  the  Jaccard  similarity  matrix  between 
the  ground  truth  and  the  newly  obtained  partition,  showing  in  Figure  4.5(d)  that  we  almost  perfectly  recover 
the  structure  of  the  ground  truth  bins. 

5.  A  Markov  approach  for  computing  the  steady  distribution  of  the  slow  variable.  In  this  section, 
we  focus  on  the  final  step  of  the  ADM-CLE  approach,  of  estimating  the  stationary  distribution  of  the  slow  variable, 
without  any  prior  knowledge  of  what  the  slow  variable  actually  is.  One  of  the  ingredients  needed  along  the  way 
is  an  estimation  of  the  conditional  distribution  P(FjS  =  s )  of  the  fast  variable  F  given  a  value  s  of  the  slow 
variable  S ,  which  we  compute  via  two  approaches.  As  the  first  approach,  we  consider  the  Conditional  Stochastic 
Simulation  Algorithm  (CSSA)  [8]  which  is  given  in  Algorithm  2.  It  samples  from  the  distribution  of  the  fast 
variable  conditioned  on  the  slow  variable.  The  second  approach  is  entirely  analytic  and  free  of  any  stochastic 
simulations,  and  amounts  to  analytically  solving  the  CME  for  each  set  in  the  partition  V  =  {'Pi,  ■  ■  ■ ,  P&}.  We 

then  compare  our  results  to  the  Constrained  Multiscale  Algorithm  (CMA)  introduced  in  [8] ,  which  approximates 
the  effective  dynamics  of  the  slow  variable  as  a  SDE,  after  estimating  the  effective  drift  and  diffusion  using  the 
CSSA  (Algorithm  2). 

5.1.  A  stochastic  simulation  algorithm  for  estimating  the  conditional  probability  (CSSA).  Our 

next  task  is  to  estimate  the  conditional  distribution  P(E|S'  =  s)  of  the  fast  variable  F  given  a  value  s  of  the  slow 
variable  S.  One  possible  approach  for  doing  this  relies  on  the  CSSA  algorithm  to  globally  integrate  the  effective 
dynamics  of  the  slow  variable.  One  iteration  of  the  CSSA  is  given  in  Algorithm  2.  Ideally,  one  repeats  steps 
1-6  of  Algorithm  2  and  samples  values  of  F  until  the  distribution  P(F|5  =  s)  converges.  In  practice,  we  run 
Algorithm  2  until  Lc  changes  of  the  slow  variable  S  occur.  This  computation  is  done  for  each  value  in  the  range 
of  the  slow  variable  S  =  {si,  S2, . . . ,  S|s|}- 

5.2.  An  analytical  derivation  of  the  conditional  distribution.  An  alternative  approach  which  we  fol¬ 
low  in  this  paper  relies  on  an  analytical  computation  of  the  conditional  distribution  P(E|5'  =  s),  thus  eliminating 
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Algorithm  2  One  iteration  of  the  CSS  A  for  computing  the  conditional  distribution  P(F|S'  =  s)  of  the  fast 
variable  F  given  a  value  s  of  the  slow  variable  S 

m 

1:  Compute  the  propensity  functions  a.i(t ),  for  i  =  1,2 , ,m,  and  their  sum  ao(t)  =  E  otiit). 

2:  Generate  ri  and  r2,  two  uniformly  distributed  random  numbers  in  (0, 1).  *=i 

3:  Compute  the  next  reaction  time  as  t  +  t  where  r  =  —  log(rr)/ao(t). 

4:  Use  r2  to  select  reaction  Rj  which  occurs  at  time  t  +  t 

(each  reaction  Ri ,  i  =  1, 2, . . . ,  m  occurs  with  probability  cti/ao). 

5:  If  the  slow  variable  S  changes  its  current  state  from  s  to  s'  7^  s  due  to  reaction  Rj  occurring,  reset  S  =  s  to 
its  previous  value,  while  not  changing  the  value  of  the  fast  variable  F. 

6:  If  any  of  the  variables  X,  goes  outside  the  boundary  of  the  considered  domain,  then  revert  to  the  state  of 
the  system  in  Step  4  before  reaction  Rj  occurred. 
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Table  5.1 


Illustrative  example  CS-II.  The  set  of  all  ground  states  of  the  system  (£1,2:2)  corresponding  to  the  slow  variable  S  =  X ]  2 X2  = 

7.  We  denote  by  (x\ ,  x’2  )  the  states  reachable  from  (2:1, 2:2)  in  one  transition  step,  and  by  S'  the  associated  corresponding  slow 
variable  such  that  S'  =  X[  +  2X'2.  Rj  denotes  the  reaction  channel  that  takes  the  chemical  system  from  state  (xi,  2:2)  to  (x\ ,  x2), 
with  corresponding  propensity  aj.  We  highlight  in  bold  letters  the  subset  of  all  states  via  which  the  system  can  transition  in  one 
step  from  the  slow  variable  S  =  7  to  S  =  8. 


the  need  for  any  expensive  stochastic  simulations.  We  illustrate  in  Figure  5.1  the  transition  diagrams  for  the  two 
chemical  systems  we  consider  in  this  paper.  For  chemical  system  CS-II,  the  system  can  transition  from  a  given 
state  (2+,  X2)  to  four  adjacent  distinct  0-states:  to  (x\  —  2, 2:2  +  1)  via  channel  Re,  to  (x\  +  2, 2:2  —  1)  via  channel 
Re,  to  ( X\  —  1,2:2)  via  channels  R2  and  R4,  and  finally  to  state  (or  +  1,2:2)  via  channels  R±  and  R3.  However, 
in  terms  of  the  underlying  slow  variable  S,  the  system  can  transition  to  only  two  adjacent  states  S  =  s  —  1  (via 
channels  R2  and  R4)  and  S  =  s  + 1  (via  channels  R±  and  R3),  or  remain  at  the  current  state  S  =  s,  via  channels 
Re  and  Re-  Considering  the  subsystem  of  fast  reactions  of  CS-II  and  conditioning  on  the  line  s  =  Xi  +  2x2,  the 
stationary  CME  takes  the  form 

0  =  ke{fx\  +  2)(xi  +  1)  P(Xl  =2:1+2,  A2  =  2:2  —  1)  +  ke{  X2  +  1)  P(-Xi  =2:1  —  2,  A2  =  2:2  +  1) 

-  (fc52;i(2;i  -  1)  +  keX2)^{Xi  =  Xj,X2  =  2:2). 

Thus,  the  conditional  distribution  for  CS-II  is,  for  0  <  ar  <  s,  given  by 

C  /  ke\X2  C  f  ke\^s~Xl^2 

P(F  =  x1|S  =  »)=  — =Il!((a_Il)/2)!  (tj  ■  if  [s  —  x,)  is  an  even  number. 
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(a)  s 

(x-l,y+l) 


(b)  S 

(x-2,y+l) 


(x+2,y-l) 

s 


Fig.  5.1.  Transition  diagrams  for  the  two  chemical  systems:  (a)  CS-I;  and  (b)  CS-II. 
Pa-1  P s  Ps+1 

5-1 


S=s- 1 


0 


0 


0 


5+1 


Fig.  5.2.  The  Markov  chain  on  the  slow  variable  state  space,  using  the  aggregated  transition  probabilities  (5.1) -(5.2)  for  the 
chemical  system  CS-II. 


Here,  C  is  the  normalization  constants  and  P (F  =  x±\S  =  s)  =  0  if  (s  —  x±)  is  an  odd  number. 

A  similar  argument  can  be  used  for  CS-I.  The  stationary  CME  of  the  fast  subsystem  of  CS-I  is  written  as 


0  —  &2  (#1  +  1)  P(Xi  —  X\  +  1,  X2  —  %2  —  1)  T  (#2  +  1)  P(Ah  —  X\  —  1,  X2  —  X2  1) 
-  {k2Xi  +  k3x2)  P(A'i  =X\,X  =  x2). 


where  s  =  (x\  +  x2)/2.  Thus,  the  conditional  distribution  for  CS-I  is,  for  0  <  X\  <  2s,  given  by 


P(F  =  xijS1  =  s ) 


C  fkA*2 
X\\x2l  \k3J 


C 

X\\(2s  —  Xi)\ 


where  C  is  again  the  normalization  constant. 

5.3.  Aggregated  transition  rates  and  a  Markov  Chain  on  the  state  of  slow  variables.  In  the  final 
step  of  the  ADM-CLE  approach,  we  set  up  a  Markov  chain  on  the  state  space  of  slow  variables  with  the  end 
goal  of  estimating  the  stationary  distribution  of  the  slow  variable.  As  illustrated  in  Figure  5.1(b),  the  system 
CS-II  can  can  transition  from  a  given  state  S'  =  s  to  two  adjacent  states  S  =  s  —  1  (via  reaction  channels  R2 
and  R4)  and  S  =  s  +  1  (via  channels  R\  and  S3),  or  it  can  remain  at  the  current  state  S  =  s,  via  channels  S5 
and  Rq.  Consider  now  the  set  Vs  =  |xb)  =  {x\,x2)^\xi  +  2x2  =  s},  illustrated  as  the  middle  bin  in  Figure  5.2. 
To  compute  the  transition  rate  between  two  adjacent  bins  Vs  and  Vs+i,  one  has  to  aggregate  over  possible  ways 

(s) 

of  getting  from  an  observable  state  in  bin  Vs  to  an  observable  state  in  bin  Vs+i-  We  compute  0)  to  be  the 
aggregated  transition  rate  from  state  Vs  to  state  Vs+i,  over  all  possible  states  (xW,x^),  such  that  x^)  g  ps 
and  xW  G  Vs+1,  by 


m 

&1=  E  E  P(F  =  4i)|S  =  s)^afc(x«)Q(x«,x^,Sfc) 

xCO  x(J)  SP,+1  k= 1 


(5.1) 


where  Q(x^\x.^\Rk)  denotes  the  indicator  functions  of  whether  one  can  transition  from  the  C2-state  x-b  to 
0-state  x^d  via  reaction  Rk ■  We  define  similarly  the  aggregated  transition  rate  0|,  that  the  chemical  system 
transitions  from  the  slow  variable  state  Vs  to  Vs-i  by 

m 

02=  E  E  P(F  =  4')|5  =  S)E^(xW)Q(x(i)>x0)>^)  (5-2) 

xW£p,x(i)eps_  i  fc= 1 


We  illustrate  in  Figure  5.3  the  aggregated  transition  rates  between  the  slow  state  S  =  s  and  its  adjacent  states 
S  =  s  —  1  and  S  =  s  +  1,  for  all  values  of  the  slow  variable  S.  Note  that  in  the  derivations  (5.1)  and  (5.2),  we 
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(a)  (b) 


(c) 


Fig.  5.3.  Plot  of  the  aggregated  transition 


rates  for  the  illustrative  example  CS-II:  (a)  ©i;  (b)  ©2;  and  (c)  ©1  —  ©2. 


(a) 


(b) 


Error  =  0.028239 


Error  =  0.0033892 


Fig.  5.4.  The  final  estimated  stationary  distribution  of  the  slow  variable  S  for  the  ADM-CLE  approach,  computed  without 
knowledge  of  the  slow  variable,  for  (a)  the  chemical  system  CS-I;  and  (b)  the  chemical  system  CS-II.  (blue  histograms).  Red  solid 
lines  are  exact  solutions  computed  by  solving  the  CME  of  the  full  model  and  using  the  corresponding  definition  of  the  slow  variable. 


can  either  rely  on  the  CSSA  algorithm  to  sample  from  the  conditional  distribution  of  fast  variables  given  values 
for  the  slow  variables,  as  shown  in  Section  5.1,  or  use  the  analytic  formulation  which  is  possible  to  derived  for 
both  CS-I  and  CS-II,  see  Section  5.2. 

Finally,  we  compute  the  solution  to  the  stationary  CME  associated  to  the  system  in  Figure  5.2  which  can  be 
written  as 


0  =  0*  17r(s  -  1)  +  ©2+17t(s  +  1)  -  (©1  +  ©2)  7 r(s), 


(5.3) 


where  7r(s)  ~  P(S  =  s)  is  the  probability  that  S  =  s  at  time  t.  Assuming  that  7r(s)  =  0  for  all  s  ^  5  and 
using  no-flux  boundary  conditions,  we  arrive  at  a  linear  system.  The  eigenvector  of  the  resulting  matrix,  with 
associated  eigenvalue  A  =  0,  yields  an  approximate  solution  of  the  stationary  CME,  which  we  plot  in  blue  in 
Figure  5.4.  Our  result  is  visually  indistinguishable  from  the  exact  solution  (plotted  as  the  red  solid  line). 

5.4.  A  comparison  with  the  Constrained  Multiscale  Algorithm  (CMA).  We  compare  the  approach 
we  introduced  in  the  previous  Section  5.3  with  the  CMA  method  proposed  in  [8].  We  compare  the  results  of  the 
two  methods  with  the  ground  truth,  and  record  the  error  defined  as 


Error ^7r,  P(£  =  s)j  =  ^  1 7r(s)  —  P(S  =  s)  , 

sGS 


(5.4) 


where  P(S  =  s)  denotes  the  ground  truth  probability  distribution  of  the  slow,  and  7r  denotes  the  estimated 
solution,  either  by  the  CMA  and  or  the  ADM-CLE.  As  Table  5.2  shows,  the  ADM-CLE  approach  yields  lower 
errors  compared  to  the  CMA  algorithm,  even  when  we  run  the  latter  with  the  parameter  Lc  as  large  as  20,000. 
Note  that  for  the  chemical  system  CS-I,  the  ground  truth  probability  distribution  of  the  slow  variable  P(S  =  s) 
can  be  easily  computed  using  the  multivariate  Poisson  distribution,  as  discussed  in  Section  2.1.  For  the  second 
chemical  system  CS-II,  we  consider  as  ground  truth  the  solution  obtained  by  solving  the  associated  CME  of  the 
full  model  in  a  large  (truncated)  domain. 
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Lc  =  100 

Lc  =  500 

Lc  =  2,000 

Lc  =  5, 000 

Lc  =  10, 000 

Lc  =  20, 000 

ADM-CLE 

CS-I 

0.21768 

0.089228 

0.10723 

0.045092 

0.040542 

0.066988 

0.003389 

CS-II 

0.66641 

0.49063 

0.14206 

0.070711 

0.12937 

0.10995 

0.028239 

Table  5.2 


The  distance  (as  measured  by  the  error  in  (5.4)J  between  the  estimated  and  the  ground  truth  probability  distributions  of  the 
slow  variable,  for  the  CM  A  algorithm  which  runs  the  CSS  A  algorithm  for  each  value  of  the  slow  variable  S  =  s,  until  Lc  changes 
of  the  slow  variable  occur.  The  rightmost  column  shows  the  recovery  errors  for  our  proposed  Markov-based  approach. 


Error  =  0.40385 


Error  =  0.05501 6 


Error  =  0.053468 


140  160  180  200  220  240  260 


140  160  180  200  220  240  260 


(b)  L  =  1,000 


(c)  L  =  10,  000 


Error  =  0.066556 


140  160  180  200  220  240  260 


(d)  L  =  100,  000 


Error  =  0.039577 


140  160  180  200  220  240  260 


140  160  180  200  220  240  260 


140  160  180  200  220  240  260 


140  160  180  200  220  240  260 


(e)  L  =  100 


(f)  L  =  1,000 


(g)  L  =  10,  000 


(h)  L  =  100,  000 


Fig.  5.5.  (a)— (d)  The  stationary  distribution  of  the  slow  variable  computed  by  the  CMA  for  CS-I,  using  knowledge  of  the  slow 

variable  (blue  histograms) .  The  red  solid  line  is  the  exact  solution,  P(S  =  s),  obtained  by  solving  the  CME  of  the  full  system  CS-I. 
The  CMA  approach  runs  the  CSS  A  algorithm  for  each  value  of  the  slow  variable  S  =  s  until  Lc  =  {102, 103, 104, 105}  changes  of 
the  slow  variable  occur.  Panels  (e)-(f)  show  the  CMA-computed  distribution  after  smoothing  out  by  the  Kernel  Density  Estimation 
procedure. 


In  Table  5.2  we  show  numerical  results  that  highlight  the  accuracy  improvement  of  the  ADM-CLE  approach 
compared  to  the  CMA  approach  of  [8].  For  the  latter  method,  we  run  the  CSS  A  algorithm  [8]  for  each  value  of 
the  slow  variable,  until  Lc  changes  of  the  slow  variable  occur.  As  expected,  the  accuracy  of  the  CMA  algorithm 
improves  as  Lc  increases,  at  the  cost  of  additional  computational  running  time  of  the  method.  In  comparison, 
our  stochastic  simulation  free  approach  yields  significantly  more  accurate  results,  with  errors  that  are  at  least 
one  order  of  magnitude  lower  than  the  CMA  method  with  L  =  20,  000.  We  plot  in  Figures  5.5  and  5.6  (top 
rows)  the  estimated  stationary  distribution  of  the  CMA  method  for  both  chemical  systems  considered  throughout 
this  paper,  for  several  values  of  the  L  parameter.  The  bottom  rows  of  the  same  Figures  5.5  and  5.6  show  the 
estimated  distribution,  after  smoothing  out  by  the  Kernel  Density  Estimation  (KDE). 

6.  Summary  and  discussion.  In  this  paper  we  have  introduced  an  ADM-CLE  approach  for  detecting 
intrinsic  slow  variables  in  high-dimensional  dynamic  data,  generated  by  stochastic  dynamical  systems.  In  the 
original  ADM  framework,  the  local  bursts  of  simulations  initiated  at  each  data  point  to  estimate  the  local 
covariances  are  computationally  expensive,  a  shortcoming  we  avoid  by  using  an  approximation  of  the  CLE.  A 
second  innovation  that  further  improved  the  computational  performance  relates  to  the  underlying  similarity 
graph,  a  starting  point  for  the  diffusion  map  approach.  By  exploiting  the  spectrum  of  each  local  covariance 
matrix,  we  built  a  sparse  ellipsoid-like  neighborhood  graph  at  each  point  in  the  data  set,  with  the  end  result  of 
being  able  to  build  a  sparse  similarity  graph  that  requires  the  computation  of  a  much  smaller  number  of  distances, 
which  makes  the  ADM-CLE  approach  scalable  to  networks  with  thousands  or  even  tens  of  thousands  of  nodes. 
For  the  two  illustrative  examples  considered  in  this  paper,  the  size  of  the  resulting  graphs  is  N  =  10,  201  for 
CS-I,  and  N  =  12, 100  for  CS-II,  respectively.  Had  these  graphs  been  complete  graphs,  the  number  of  resulting 
weighted  edges  would  be  over  50  million,  while  in  our  computations,  the  number  of  edges  is  approximately  2.9 
million  for  CS-I,  and  3.9  million  for  CS-II. 

We  have  proposed  a  spectral-based  method  for  inferring  the  slow  variable  present  within  the  chemical  sys- 
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(e)  L  =  100 


(f)  L  =  1,000 


(g)  L  =  10,  000  (h)  L  =  100,  000 


Fig.  5.6.  (a)— (d)  The  stationary  distribution  of  the  slow  variable  computed  by  the  CMA  for  CS-II,  using  knowledge  of  the  slow 

variable  (blue  histograms) .  The  red  solid  line  is  the  exact  solution,  P(S  =  s),  obtained  by  solving  the  CME  of  the  full  system  CS-II. 
The  CMA  approach  runs  the  CSS  A  algorithm  for  each  value  of  the  slow  variable  S  =  s  until  Lc  =  (102. 103, 104, 105}  changes  of 
the  slow  variable  occur.  Panels  (e)-(f)  show  the  CMA-computed  distribution  after  smoothing  out  by  the  Kernel  Density  Estimation 
procedure. 


tem  without  any  prior  knowledge  of  its  structure,  and  a  Markov-based  approach  for  estimating  its  stationary 
distribution.  We  augment  the  proposed  algorithmic  approach  with  numerical  simulations  that  confirm  that  the 
ADM-CLE  approach  can  compare  favorably  for  some  systems  to  the  CMA  for  estimating  the  stationary  distri¬ 
bution  of  such  slow  variables.  The  ADM-CLE  approach  can  also  be  applied  to  systems  with  a  low  number  of 
states  of  slow  variables.  The  CMA,  as  introduced  in  [8],  is  more  suitable  for  systems  where  the  slow  variable(s) 
can  take  many  different  values,  because  the  CMA  uses  an  underlying  SDE  approximation  for  the  behaviour  of 
the  slow  variables.  One  option  to  overcome  this  problem  is  to  estimate  effective  propensity  functions  of  the  slow 
subsystem  [9].  An  open  question  is  to  extend  the  ADM-CLE  to  systems  where  the  range  of  the  Xi  variables  is 
very  large.  In  the  ADM-CLE  approach  applied  to  CS-I  or  CS-II,  we  associate  a  state  (i.e. ,  node  in  the  initial 
graph)  to  each  possible  combination  of  pairs  of  states  (xi,  X2),  an  approach  no  longer  feasible  whenever  the  range 
of  the  variables  is  large.  To  bypass  this  problem,  one  could  change  the  discretization  of  the  state  space  and 
modify  accordingly  the  Markov  chain  based  approach  (Figure  5.2)  used  in  the  ADM-CLE. 

The  ADM-CLE  couples  a  method  for  finding  slow  variables  (ADM)  with  an  approach  to  compute  the  station¬ 
ary  distribution  of  a  multiscale  chemical  reaction  network.  Chemical  systems  depend  on  a  number  of  parameters 
(e.g.  kinetic  rate  constants)  and  an  open  question  is  to  extend  the  ADM  approach  to  situations  where  one  (or 
more)  parameters  are  varied,  i.e.  to  perform  bifurcation  analysis  of  multiscale  stochastic  chemical  systems  [30]. 

Finally,  we  point  out  recent  work  of  Dsilva  et  al.  [11],  who  also  rely  on  the  ADM  framework  to  discover 
nonlinear  intrinsic  variables  in  high-dimensional  data  in  the  context  of  multiscale  simulations,  with  the  task 
of  merging  different  simulations  ensembles  or  partial  observations  of  such  ensembles  in  a  globally  consistent 
manner.  Their  work  is  motivated  by  the  fact  that  often  one  is  not  merely  interested  in  extracting  the  hidden 
(slow)  variables  from  the  underlying  low-dimensional  manifold,  given  partial  observations  x'1',  i  =  1,2, . . . ,  N  as 
in  the  ADM  setting,  but  also  in  extending  high-dimensional  functions  on  a  set  of  points  lying  in  a  low-dimensional 
manifold.  Their  proposed  approach  relies  on  the  so  called  Laplacian  Pyramids  [35],  a  multiscale  algorithm  for 
extending  a  high-dimensional  function  defined  on  a  set  of  points  in  the  space  of  intrinsic  variables  to  a  second 
set  of  points  not  in  the  data  set,  by  using  Laplacian  kernels  of  decreasing  bandwidths  (the  e  parameter  in  (3.5)). 
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